Performance of many Natural Language Processing (NLP) systems has reached a plateau with existing techniques. There is a general consensus that systems must integrate semantic knowledge or world knowledge in some form to provide the additional information required to improve the quality of results. But building semantic resources that are both adequate and large enough remains a difficult, unsolved problem. In my work, I attack the problem of very large scale acquisition of semantic knowledge by exploiting natural language text available on the Internet. In particular, I concentrate on one problem: extracting is-a relations from a very large corpus (70 million web pages, 26 billion words) downloaded from the Internet. Since the amount of data involved is two orders of magnitude greater than in previously published work, the algorithms had to be designed to be highly scalable. This was achieved by:
Using these algorithms, I extract is-a relations from text to build a very large table. These extracted relations are then evaluated using a host of different applications.
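As an illustration only (the source does not describe the exact extraction method), one classic way to mine is-a relations from raw text is to match Hearst-style lexico-syntactic patterns such as "NPs such as NP, NP and NP". A minimal sketch, assuming a simple regex over plain English sentences:

```python
import re

# Hypothetical illustration: a single Hearst-style pattern
# "<hypernym>s such as <instance>, <instance> and <instance>".
# Real large-scale systems use many patterns plus parsing and filtering.
PATTERN = re.compile(r"(\w+)s such as ((?:\w+, )*\w+(?: and \w+)?)")

def extract_isa(text):
    """Return (hyponym, hypernym) pairs found via the 'such as' pattern."""
    pairs = []
    for match in PATTERN.finditer(text):
        hypernym = match.group(1)            # singular form of the class noun
        instances = re.split(r",\s*|\s+and\s+", match.group(2))
        for inst in instances:
            inst = inst.strip()
            if inst:
                pairs.append((inst, hypernym))
    return pairs

print(extract_isa("He studied languages such as Spanish, Italian and French."))
# → [('Spanish', 'language'), ('Italian', 'language'), ('French', 'language')]
```

At web scale, such per-sentence matching parallelizes naturally, since each document can be processed independently; the resulting (hyponym, hypernym) pairs would then be aggregated and counted to build the table of is-a relations described above.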